Latent Topic Model Based on Gaussian-LDA for Audio Retrieval

Authors

  • Pengfei Hu
  • Wenju Liu
  • Wei Jiang
  • Zhanlei Yang
Abstract

In this paper, we introduce a new topic model named Gaussian-LDA, which is better suited to modeling continuous data. Topic models based on latent Dirichlet allocation (LDA) are widely used for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a multinomial distribution over the vocabulary. To apply the original LDA to continuous data, discretization based on vector quantization must be performed beforehand, which usually results in information loss. In the proposed model, we use a continuous emission probability, a Gaussian distribution instead of a multinomial distribution. The new topic model achieves higher performance than standard LDA in audio retrieval experiments.
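
The generative idea described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric Dirichlet prior `alpha` over topic proportions and one full-covariance Gaussian per topic; the function name `sample_document`, the parameter names, and the toy values are hypothetical and are not taken from the paper. It only samples from an LDA-style model with Gaussian emissions and does not reproduce the authors' inference or retrieval procedure.

```python
import numpy as np


def sample_document(n_frames, alpha, means, covs, rng):
    """Draw one document from an LDA-style process with Gaussian emissions.

    Assumed setup (not from the paper): symmetric Dirichlet prior `alpha`,
    one Gaussian (means[k], covs[k]) per topic k.
    """
    n_topics = means.shape[0]
    # Per-document topic proportions, as in standard LDA.
    theta = rng.dirichlet(np.full(n_topics, alpha))
    # One latent topic assignment per frame (the analogue of a word).
    topics = rng.choice(n_topics, size=n_frames, p=theta)
    # Each frame is a continuous vector drawn from its topic's Gaussian,
    # instead of a codeword drawn from a multinomial over a vocabulary.
    frames = np.stack(
        [rng.multivariate_normal(means[k], covs[k]) for k in topics]
    )
    return theta, topics, frames


# Toy example: 3 topics over 2-dimensional feature vectors.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
covs = np.stack([np.eye(2)] * 3)
theta, topics, frames = sample_document(
    200, alpha=0.5, means=means, covs=covs, rng=rng
)
print(frames.shape)
```

Because the frames stay continuous, no vector quantization step is needed before modeling, which is the source of information loss the abstract attributes to standard LDA.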

Similar resources

Latent topic model for audio retrieval

Latent topic models such as Latent Dirichlet Allocation (LDA) were designed for text processing and have also demonstrated success in audio-related processing tasks. The main idea behind LDA is that the words of each document arise from a mixture of topics, each of which is a multinomial distribution over the vocabulary. When applying the original LDA to process continuous data, th...

Supervised acoustic topic model for unstructured audio information retrieval

We introduce a modified version of the acoustic topic model, which assumes an audio signal consists of latent acoustic topics and each topic can be interpreted as a distribution over acoustic words, for unstructured audio information retrieval applications. The proposed supervised acoustic topic model is based on supervised latent Dirichlet allocation (sLDA) while the conventional acoustic topi...

Study of entity-topic models for OOV proper name retrieval

Retrieving Proper Names (PNs) relevant to an audio document can improve speech recognition and content based audio-video indexing. Latent Dirichlet Allocation (LDA) topic model has been used to retrieve Out-Of-Vocabulary (OOV) PNs relevant to an audio document with good recall rates. However, retrieval of OOV PNs using LDA is affected by two issues, which we study in this paper: (1) Word Freque...

Multi Domain Semantic Information Retrieval Based on Topic Model

Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as huge amounts of information are increasingly accumulated on the Web. This information explosion increases the need for new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques primarily used to search and extract important informa...

Tensor Decomposition for Topic Models: An Overview and Implementation

The goal of a topic model is to characterize observed data in terms of a much smaller set of unobserved topics. Topic models have proven especially popular for information retrieval. Latent Dirichlet Allocation (LDA) is the most popular generative model used for topic modeling. Learning the optimal parameters of the LDA model efficiently, however, is an open question. As [2] point out, the trad...

Publication date: 2012